Neural Machine Translation of Rare Words with Subword Units

Abstract

Neural machine translation (NMT) models typically operate with a fixed vocabulary, so the translation of rare and unknown words is an open problem. Previous work addresses this problem through back-off dictionaries. In this paper, we introduce a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units, based on the intuition that various word classes are translatable via smaller units than words, for instance names (via character copying or transliteration), compounds (via compositional translation), and cognates and loanwords (via phonological and morphological transformations). We discuss the suitability of different word segmentation techniques, including simple character n-gram models and a segmentation based on the byte pair encoding compression algorithm, and empirically show that subword models improve over a back-off dictionary baseline for the WMT 15 translation tasks English→German and English→Russian by 1.1 and 1.3 BLEU, respectively.
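The byte pair encoding (BPE) segmentation mentioned in the abstract can be illustrated with a short Python sketch. It repeatedly merges the most frequent adjacent symbol pair in a frequency-weighted vocabulary. It then replays the learned merges to split a word into known subword units. The toy vocabulary, the `</w>` end-of-word marker, the number of merges, and the helper names (`learn_bpe`, `segment`, and so on) are assumptions chosen for illustration. This is not the authors' released code.

```python
import re
from collections import Counter


def get_pair_stats(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[a, b] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of the symbol pair with one merged symbol."""
    merged = "".join(pair)
    # Match the pair only at symbol boundaries, never inside a longer symbol.
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub(lambda _: merged, word): freq for word, freq in vocab.items()}


def learn_bpe(vocab, num_merges):
    """Learn up to num_merges merge operations (ties go to the first pair seen)."""
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_stats(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


def segment(word, merges):
    """Split a (possibly unseen) word into characters, then replay merges in order."""
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols


# Hypothetical toy corpus: words pre-split into characters with frequencies.
TOY_VOCAB = {"l o w </w>": 5, "l o w e r </w>": 2,
             "n e w e s t </w>": 6, "w i d e s t </w>": 3}
merges, final_vocab = learn_bpe(TOY_VOCAB, num_merges=10)
print(merges[:3])                 # [('e', 's'), ('es', 't'), ('est', '</w>')]
print(segment("lowest", merges))  # ['low', 'est</w>']
```

In this toy run, "lowest" never occurs in the training vocabulary. It is still segmented into the known units `low` and `est</w>`, which is the open-vocabulary behaviour the abstract describes. This sketch breaks ties between equally frequent pairs by first occurrence, which is an arbitrary choice.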
